Abstract

In this paper, we report the experiments we conducted for our
participation in the TREC 2012 Web Track. We experimented with a
brand new system that models the latent concepts underlying a query.
We use Latent Dirichlet Allocation (LDA), a generative probabilistic
topic model, to extract highly specific query-related topics from
pseudo-relevant feedback documents. We define these topics as the
latent concepts of the user query. Our approach automatically
estimates the number of latent concepts as well as the needed number
of feedback documents, without any prior training step. These
concepts are incorporated into the ranking function with the aim of
promoting documents that refer to many different query-related
themes.

We also explored the use of different types of information sources
for modeling the latent concepts. For this purpose, we use four
general sources of information of various natures (web, news,
encyclopedic) from which the feedback documents are extracted.

1  Introduction 

When searching for specific information, users query the retrieval
system with a list of keywords, a question, a declarative sentence or
maybe a long description of the search topic. However, this often
does not fully describe the user's information need, which may harm
retrieval performance. One way to better outline the topic of the
search without the help of the user is to enrich the query with
additional information. Such query expansion techniques have been
shown to significantly improve the effectiveness of retrieval systems
in many previous TREC tracks.

The goal of the work presented in this paper is to accurately
represent the underlying core concepts involved in a search process,
hence indirectly improving the contextual information surrounding
this search. For this purpose, we introduce an unsupervised framework
that tracks the implicit concepts related to a given query and
improves document retrieval effectiveness by incorporating these
concepts into the initial query. For each query, latent concepts are
extracted from a reduced set of feedback documents initially
retrieved by the system. These feedback documents can come from the
target collection or from any other textual source of information.

The main strength of our approach is that it is entirely unsupervised
and does not require any training step. The number of needed feedback
documents as well as the optimal number of concepts are automatically
estimated at query time. We emphasize that the algorithms have no
prior information about these concepts. The method is also entirely
independent of the source of information used for concept modeling.
Queries are not labelled with topics or keywords and we do not
manually fix any parameter at any time, except the number of words
composing the concepts.

2  Query-Oriented  LDA

2.1  Latent  Dirichlet  Allocation 

Latent Dirichlet Allocation is a generative probabilistic topic model
(Blei et al., 2003). The underlying intuition is that documents
exhibit multiple topics, where a topic is a multinomial distribution
over a fixed vocabulary $W$. The goal of LDA is thus to automatically
discover the topics from a collection of documents. The documents of
the collection are modeled as mixtures over $K$ topics, each of which
is a multinomial distribution over $W$. Each topic multinomial
distribution $\phi_k$ is generated by a conjugate Dirichlet prior
with parameter $\beta$, while each document multinomial distribution
$\theta_d$ is generated by a conjugate Dirichlet prior with parameter
$\alpha$. Thus, the topic proportions for document $d$ are
$\theta_d$, and the word distributions for topic $k$ are $\phi_k$. In
other words, $\theta_{d,k}$ is the probability of topic $k$ occurring
in document $d$ (i.e. $P(k|d)$). Respectively, $\phi_{k,w}$ is the
probability of word $w$ belonging to topic $k$ (i.e. $P(w|k)$). Exact
LDA estimation was found to be intractable and several approximations
have been developed (Blei et al., 2003; Griffiths and Steyvers,
2004). In this work we use the variational approximation algorithm
implemented and distributed by Prof. Blei¹.

Each learned multinomial distribution $\phi_k$ is traditionally
presented as a list of the top words with the highest probabilities
for topic $k$. Topics can then be easily identified by their most
representative words.
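As an illustration of how a topic is reduced to its most representative words, here is a minimal Python sketch. The function name `top_n_words` and the toy distribution are assumptions made for this example; they are not part of the lda-c package.

```python
from typing import Dict, List


def top_n_words(phi_k: Dict[str, float], n: int = 10) -> List[str]:
    """Return the n words with the highest probability P(w|k) in one topic."""
    # Sort by decreasing probability; ties are broken alphabetically so that
    # the output is deterministic.
    ranked = sorted(phi_k.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


# Toy topic-word distribution (invented numbers, for illustration only).
phi_toy = {"retrieval": 0.30, "query": 0.25, "topic": 0.20,
           "model": 0.15, "web": 0.10}
print(top_n_words(phi_toy, n=3))  # ['retrieval', 'query', 'topic']
```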


¹ http://www.cs.princeton.edu/~blei/lda-c

2.2  Estimating  the  number  of  concepts

There can be a large number of concepts underlying an information
need. Latent Dirichlet Allocation makes it possible to model the
topic distribution of a given collection, but the number of topics is
a fixed parameter. However, we cannot know in advance the number of
concepts that are related to a given query. We propose a method that
automatically estimates the number of latent concepts based on their
word distributions.

Considering that LDA's topics are constituted of the n words with the
highest probabilities, we define an argmax[n] operator which produces
the top-n arguments that obtain the n largest values for a given
function. Using this operator, we obtain the set $W_k$ of the n words
that have the highest probabilities $P(w|k) = \phi_{k,w}$ in topic
$k$:

$$W_k = \underset{w}{\operatorname{argmax}[n]} \; \phi_{k,w}$$

Latent Dirichlet Allocation needs a given number of topics in order
to estimate topic and word distributions. Several approaches have
been studied for automatically finding the right number of LDA's
topics contained in a set of documents (Arun et al., 2010; Cao et
al., 2009). Even though they differ in some respects, they follow the
same idea of computing similarities (or distances) between pairs of
topics over several instances of the model, while varying the number
of topics. It comes down to a clustering approach which delineates
the different clusters. Here the clusters are the topics and the
objective is to maximize the dissimilarity between topics. Iterations
are done by varying the number of topics of the LDA model, then
estimating again the Dirichlet distributions. The optimal number of
topics of a given collection is reached when the overall
dissimilarity between topics achieves its maximum value.

We perform a simple heuristic that estimates the number of latent
concepts of a user query by maximizing the information divergence $D$
between all pairs $(k_i, k_j)$ of LDA's topics. The number of
concepts $\hat{K}$ estimated by our method is given by the following
formula:

$$\hat{K} = \underset{K}{\operatorname{argmax}} \; \frac{1}{K(K-1)} \sum_{(k_i, k_j) \in T_K} D(k_i \| k_j)$$

where $K$ is the number of topics given as a parameter to LDA, and
$T_K$ is the set of $K$ topics. In other words, $\hat{K}$ is the
number of topics for which LDA modeled the most scattered topics. The
Kullback-Leibler divergence measures the information divergence
between two probability distributions. It is used in particular by
LDA in order to minimize topic variation between two
expectation-maximization iterations (Blei et al., 2003). It has also
been widely used in a variety of fields to measure similarities (or
dissimilarities) between word distributions (AlSumait et al., 2008).
Since it is a non-symmetric measure, we use the Jensen-Shannon
divergence, which is the symmetric version of the KL divergence, to
avoid obvious problems when computing divergences between all pairs
of topics:


$$D(k_i \| k_j) = \frac{1}{2} \sum_{w \in W} p(w|k_i) \log \frac{p(w|k_i)}{p(w|k_j)} + \frac{1}{2} \sum_{w \in W} p(w|k_j) \log \frac{p(w|k_j)}{p(w|k_i)}$$
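The selection of the number of concepts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topic models for each candidate K are supposed to be already trained and passed in as plain dictionaries, the divergence is implemented as the average of the two directed KL divergences (as in the formula above), and the small `eps` smoothing term is our own addition to avoid taking the logarithm of zero. All function names are hypothetical.

```python
import math
from itertools import permutations
from typing import Dict, List

Topic = Dict[str, float]  # word -> P(w|k)


def divergence(p: Topic, q: Topic, eps: float = 1e-12) -> float:
    """Average of KL(p||q) and KL(q||p) over the union vocabulary."""
    total = 0.0
    for w in set(p) | set(q):
        pw = p.get(w, 0.0) + eps  # eps is an assumption, not from the paper
        qw = q.get(w, 0.0) + eps
        total += 0.5 * pw * math.log(pw / qw) + 0.5 * qw * math.log(qw / pw)
    return total


def mean_pairwise_divergence(topics: List[Topic]) -> float:
    """Sum of D over all ordered topic pairs, divided by K(K-1)."""
    k = len(topics)
    if k < 2:
        return 0.0
    total = sum(divergence(a, b) for a, b in permutations(topics, 2))
    return total / (k * (k - 1))


def estimate_num_concepts(models: Dict[int, List[Topic]]) -> int:
    """Return the K whose topics are the most scattered."""
    return max(models, key=lambda k: mean_pairwise_divergence(models[k]))


# Toy models for K = 2 and K = 3 (invented numbers).
models = {
    2: [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}],
    3: [{"a": 0.9, "b": 0.1}, {"a": 0.1, "b": 0.9}, {"a": 0.5, "b": 0.5}],
}
print(estimate_num_concepts(models))  # 2
```

In the toy data the third topic of the K = 3 model sits between the two others, which lowers the average divergence, so K = 2 is selected.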


The word probabilities for given topics are obtained from the
multinomial distributions $\phi_k$. Each word $w$ of the vocabulary
$W$ has a probability of belonging to the topic $k$, which is
expressed by $p(w|k) = \phi_{k,w}$. The final outcome is the optimal
number of topics $K$ and its associated topic model. The resulting
set of topics $T_{K,M}$ is considered as the set of $K$ latent
concepts modeled from a set of $M$ feedback documents. We will
further refer to the set $T_{K,M}$ as a concept model.

The number of relevant documents can vary from one query to another,
hence the number $M$ of feedback documents used to model the latent
concepts must also vary for each query. It is also highly dependent
on the source of information from which the feedback documents are
extracted. We propose in the following section a method for
automatically choosing the right number $M$ of feedback documents
based on concept model similarities.

2.3  How  many  feedback  documents? 

An obvious problem with pseudo-relevance feedback based approaches is
that non-relevant documents can be included in the set of feedback
documents. This problem is much more important with our approach
since it could result in learned concepts that are not related to the
initial query. We mainly tackle this difficulty by reducing the
number of feedback documents. The concentration of relevant documents
is higher in the top ranks of the list. Thus one simple way to reduce
the probability of catching noisy feedback documents is to reduce
their overall number.

However, an arbitrary number cannot be fixed for all queries. Some
information needs can be satisfied by only 2 or 3 documents, while
others may require 15 or 20. Thus the choice of the number of
feedback documents has to be automatic for each query. To this end,
we compare the concept models generated from different numbers $m$ of
feedback documents. To avoid noise, we favor the concept models that
contain concepts that are similar to those in other models. The
underlying assumption is that all the feedback documents are
essentially dealing with the same topics, whether there are 5 or 20
of them. Concepts that are likely to appear in different models
learned from various numbers of feedback documents are most likely
related to the query, while noisy concepts are not.

We estimate the similarity between two concept models by computing
the similarities between all pairs of concepts of the two models.
Considering that two concept models are generated based on different
numbers of documents (i.e. different $\mathcal{R}_Q$ collections),
they do not share the same probabilistic space. Since their
probability distributions are not comparable, computing their overall
similarity can be done solely by taking the concepts' word
distributions into account. We treat the different concepts as bags
of words and use a document frequency-based similarity measure:

$$\mathrm{sim}(T_{K,m}, T_{K,n}) = \sum_{k \in T_{K,m}} \sum_{k' \in T_{K,n}} \frac{|k \cap k'|}{|k|} \sum_{w \in k \cap k'} p(w|k) \, p(w|k') \log \frac{N}{df_w}$$

where $|k \cap k'|$ is the number of words the two concepts have in
common, $df_w$ is the document frequency of $w$ and $N$ is the number
of documents in the target collection. The initial purpose of this
measure was to track novelty (i.e. minimize similarity) between two
sentences (Metzler et al., 2005), which is close to our goal, except
that we want to track redundancy (i.e. maximize similarity) while
taking word probabilities inside the topics into account.

The final sum of similarities between each concept pair produces an
overall similarity score of the current concept model compared to all
other models. Finally, the concept model that maximizes this overall
similarity is considered the best candidate for representing the
implicit concepts of the query. In other words, we consider the top
$M$ feedback documents for modeling the concepts, where

$$M = \underset{m}{\operatorname{argmax}} \sum_{n} \mathrm{sim}(T_{K,m}, T_{K,n})$$

That is, for each query, the concept model that is the most similar
to all other learned concept models is considered as the final set of
latent concepts related to the user query.
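The selection of M can be sketched as follows. This is a minimal illustration, not the authors' implementation: concept models are assumed to be already learned for each candidate number of feedback documents and are represented as lists of word-to-probability dictionaries restricted to the top-n words. Skipping the comparison of a model with itself follows the wording "all other models" and is our reading of the text. Function names and the toy data are hypothetical.

```python
import math
from typing import Dict, List

Concept = Dict[str, float]       # word -> p(w|k), top-n words only
ConceptModel = List[Concept]


def concept_similarity(k: Concept, k2: Concept,
                       df: Dict[str, int], n_docs: int) -> float:
    """Overlap ratio times the idf-weighted product of word probabilities."""
    shared = set(k) & set(k2)
    if not shared:
        return 0.0
    overlap = len(shared) / len(k)
    weighted = sum(k[w] * k2[w] * math.log(n_docs / df[w]) for w in shared)
    return overlap * weighted


def model_similarity(a: ConceptModel, b: ConceptModel,
                     df: Dict[str, int], n_docs: int) -> float:
    """Sum the similarity over all pairs of concepts of the two models."""
    return sum(concept_similarity(k, k2, df, n_docs) for k in a for k2 in b)


def choose_num_feedback_docs(models: Dict[int, ConceptModel],
                             df: Dict[str, int], n_docs: int) -> int:
    """Return the m whose concept model is most similar to the other models."""
    def total(m: int) -> float:
        # Assumption: a model is not compared with itself.
        return sum(model_similarity(models[m], models[n], df, n_docs)
                   for n in models if n != m)
    return max(models, key=total)


# Toy data (invented): one concept per model, document frequencies out of 1000.
df = {"apple": 10, "pie": 20, "car": 5, "engine": 5}
models = {
    5: [{"apple": 0.6, "pie": 0.4}],
    10: [{"apple": 0.5, "car": 0.5}],
    20: [{"car": 0.7, "engine": 0.3}],
}
print(choose_num_feedback_docs(models, df, n_docs=1000))  # 10
```

The model built from 10 documents shares vocabulary with both of the others, so it obtains the highest overall similarity.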

2.4  Concept  weighting 

We previously detailed how we estimate the number of concepts and the
number of feedback documents from which they are extracted. In this
section we address the problem of appropriately weighting these
concepts.

User queries can be associated with a number of underlying concepts,
but these concepts do not have the same importance. For example, the
previous method for selecting the right number of feedback documents
could still yield noisy concepts, and some concepts may also be
barely relevant. Hence it is essential to emphasize appropriate
concepts and to downweight inappropriate ones. One effective way is
to rank these concepts and weight them accordingly: important
concepts will be weighted higher, thus reflecting their importance.

Recent  studies  proposed  different  approaches  to 
rank  or  score  LDA  topics  (Alsumait  et  al.,  2009; 
Newman  et  al.,  2010;  Wen  and  Lin,  2010),  how¬ 
ever  . 

Finally, the score $\delta_k$ of a concept $k$ with respect to its
overall coherence in the collection is given by:

$$\delta_k = \sum_{d \in \mathcal{R}_Q} p(d|Q) \, p(k|d)$$

where n is the number of words in each concept. The probability of a
concept $k$ appearing in document $d$ is given by the multinomial
distribution $\theta$ previously learned by LDA, hence
$p(k|d) = \theta_{d,k}$.

Each concept is weighted with respect to its coherence in the target
collection, but the actual representation of the concept is still a
bag of words. These words are the core components of the concepts and
intrinsically do not have the same importance. The easiest way of
weighting them is to use their probabilities of belonging to a
concept $k$, which are learned by Latent Dirichlet Allocation and
given by the multinomial distribution $\phi_k$. Probabilities are
normalized across all words of the concept; the weight of word $w$ in
concept $k$ is thus computed as follows:

$$\hat{\phi}_{k,w} = \frac{\phi_{k,w}}{\sum_{w' \in W_k} \phi_{k,w'}}$$

Finally, a concept learned by our latent concept modeling approach is
a set of weighted words representing a facet of the information need
underlying a user query. The concept is itself weighted to reflect
its importance relative to the other concepts.
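The two weighting steps of this section can be sketched as below. The inputs are assumed to be available from earlier stages: `p_d_given_q` holds the retrieval score of each feedback document, and `theta` holds the per-document topic proportions learned by LDA. The concept weights are normalised here so that they sum to one, which is an assumption of this sketch. All names and numbers are illustrative.

```python
from typing import Dict, List


def concept_scores(p_d_given_q: Dict[str, float],
                   theta: Dict[str, List[float]]) -> List[float]:
    """delta_k = sum over feedback documents d of p(d|Q) * p(k|d)."""
    num_concepts = len(next(iter(theta.values())))
    return [sum(p_d_given_q[d] * theta[d][k] for d in theta)
            for k in range(num_concepts)]


def normalise(values: List[float]) -> List[float]:
    """Rescale values so that they sum to one (assumed normalisation)."""
    total = sum(values)
    return [v / total for v in values]


def word_weights(phi_k: Dict[str, float],
                 top_words: List[str]) -> Dict[str, float]:
    """Renormalise phi_{k,w} over the top-n words W_k of the concept."""
    total = sum(phi_k[w] for w in top_words)
    return {w: phi_k[w] / total for w in top_words}


# Toy inputs (invented): two feedback documents, two concepts.
p_d_given_q = {"d1": 0.6, "d2": 0.4}
theta = {"d1": [0.9, 0.1], "d2": [0.5, 0.5]}
delta = concept_scores(p_d_given_q, theta)   # approximately [0.74, 0.26]
delta_hat = normalise(delta)

phi_k = {"a": 0.4, "b": 0.1, "c": 0.05}
phi_hat = word_weights(phi_k, ["a", "b"])    # approximately {'a': 0.8, 'b': 0.2}
```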

2.5  Document  ranking 

The previous subsections were all about modeling consistent concepts
from reliable documents and modeling their relative influence. Here
we detail how these concepts can be integrated in a retrieval model
in order to improve ad hoc document ranking.

There are several ways of taking conceptual aspects into account when
ranking documents. Here, the final score of a document $d$ with
respect to a given user query $Q$ is determined by the linear
combination of query word matches (standard retrieval) and latent
concept matches. It is formally written as follows:

$$s(Q,d) = P(d|Q) + \sum_{k \in T_{K,M}} \hat{\delta}_k \sum_{w \in W_k} \hat{\phi}_{k,w} \, P(d|w)$$

where $T_{K,M}$ is the concept model that holds the latent concepts
of query $Q$ (see Section 2.4) and $\hat{\delta}_k$ is the normalized
weight of concept $k$:

$$\hat{\delta}_k = \frac{\delta_k}{\sum_{k' \in T_{K,M}} \delta_{k'}}$$


The $P(d|Q)$ and $P(d|w)$ probabilities are the likelihood of
document $d$ being observed given the initial query $Q$
(respectively, word $w$). In this work we use a language modeling
approach to retrieval (Lavrenko and Croft, 2001); $P(d|w)$ is thus
the maximum likelihood estimate of word $w$ in document $d$, computed
using the language model of document $d$ in the target collection
$C$. Likewise, $P(d|Q)$ is the basic language modeling retrieval
model, also known as query likelihood, and can also be formally
written as $P(d|Q) = \prod_{q \in Q} P(d|q)$. We tackle the null
probabilities problem with standard Dirichlet smoothing, since it is
better suited to keyword queries (as opposed to verbose queries)
(Zhai and Lafferty, 2004), which is the case here. We fix the
Dirichlet prior parameter to 1500 and do not change it at any time
during our experiments. However, it is important to note that this
model is generic, and that the word matching function could be
entirely substituted by other state-of-the-art matching functions
(like BM25 (Robertson and Walker, 1994) or information-based models
(Clinchant and Gaussier, 2010)) without changing the effects of our
latent concept modeling approach on document ranking.
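To make the ranking function concrete, the sketch below scores a document as the query likelihood plus the weighted concept matches, using Dirichlet smoothing with the prior of 1500 mentioned above. It is a simplified illustration rather than the Indri-based implementation: the word-level term written $P(d|w)$ in the text is computed here as the smoothed probability of the word under the document language model, collection probabilities are passed in as a plain dictionary, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

MU = 1500.0  # Dirichlet prior value reported in the text

Concept = Tuple[float, Dict[str, float]]  # (concept weight, {word: word weight})


def smoothed_prob(word: str, doc_tf: Counter, coll_prob: Dict[str, float],
                  mu: float = MU) -> float:
    """Dirichlet-smoothed probability of `word` under the document model."""
    doc_len = sum(doc_tf.values())
    return (doc_tf[word] + mu * coll_prob.get(word, 0.0)) / (doc_len + mu)


def score(query: List[str], concepts: List[Concept], doc_tf: Counter,
          coll_prob: Dict[str, float]) -> float:
    """Query likelihood plus the weighted latent-concept matches."""
    query_part = math.prod(smoothed_prob(q, doc_tf, coll_prob) for q in query)
    concept_part = sum(
        delta * sum(phi * smoothed_prob(w, doc_tf, coll_prob)
                    for w, phi in words.items())
        for delta, words in concepts
    )
    return query_part + concept_part


# Toy data (invented for illustration).
vocab = ["web", "search", "engine", "ranking", "cooking", "recipe"]
coll_prob = {w: 0.1 for w in vocab}
query = ["web", "search"]
concepts = [(1.0, {"engine": 0.5, "ranking": 0.5})]
doc_a = Counter(["web", "search", "engine", "ranking"])
doc_b = Counter(["web", "search", "cooking", "recipe"])
score_a = score(query, concepts, doc_a, coll_prob)
score_b = score(query, concepts, doc_b, coll_prob)
print(score_a > score_b)  # True
```

Both toy documents match the query equally well; the one that also contains the concept words receives the higher score, which is the intended effect of the concept term.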

3  General  Sources  of  Information 

The approach described in the previous section requires a source of
information from which the concepts can be extracted. This source of
information can be the target collection, as in traditional relevance
feedback approaches, or an external collection. In this work we use a
set of different data sources that are large enough to deal with a
broad range of topics. We can then explore what effects the nature,
the size or the quality of the information source has on latent
concept modeling.


This set of data sources is composed of four general resources:
Wikipedia as an encyclopedic source, the New York Times and GigaWord
corpora as sources of news data, and category B of the ClueWeb09²
collection as a web source. The English GigaWord LDC corpus consists
of 4,111,240 newswire articles collected from four distinct
international sources including the New York Times (Graff and Cieri,
2003). The New York Times LDC corpus contains 1,855,658 news articles
published between 1987 and 2007 (Sandhaus, 2008). The Wikipedia
collection is a dump from July 2011 of the online encyclopedia that
contains 3,214,014 documents³. We removed spam documents from
category B of ClueWeb09 according to a standard spam list for this
collection⁴. We followed the authors' recommendations (Cormack et
al., 2011) and set the "spamminess" threshold parameter to 70. The
resulting corpus is composed of 29,038,220 web pages.


Resource   # documents   # unique words    # total words
NYT          1,855,658        1,086,233    1,378,897,246
Wiki         3,214,014        7,022,226    1,033,787,926
GW           4,111,240        1,288,389    1,397,727,483
Web         29,038,220       33,314,740   22,814,465,842

Table 1: Information about the four general sources of information
used in this work.


These four resources are heterogeneous in several respects. They vary
in terms of vocabulary size, number of documents and, of course, type
of information. We thus expect that latent concepts will be as
diverse as the sources of information from which they are modeled.

4  Experiments 

4.1  Experimental  setup 

We used Indri⁵ for indexing and retrieval. The whole ClueWeb09
collection was stemmed during indexing with the well-known light
Krovetz stemmer, and stopwords were removed using the standard
English stoplist embedded within Indri. We also removed from our
index all the documents that have a spam percentile lower than 70
according to Waterloo's list⁴. As seen in Section 2, concepts are
composed of a fixed number of weighted words. For all our runs we
fixed the number of words belonging to a given concept to n = 10.

² http://boston.lti.cs.cmu.edu/clueweb09/
³ http://dumps.wikimedia.org/enwiki/20110722/
⁴ http://plg.uwaterloo.ca/~gvcormac/clueweb09spam/
⁵ http://www.lemurproject.org

4.2  Runs 

We submitted four runs in which we explore the influence of the
number of feedback documents used for concept modeling, of the
concept weights, and of combining the general sources of information.

lcm-web  This is our reference run. It uses the complete concept
modeling approach described in this paper, but the feedback documents
from which the concepts are modeled are solely extracted from the Web
source of information (see Section 3).

lcm-web-noW  This run is the same as above, except that we removed
the concept weights (the $\hat{\delta}_k$'s). The word weights (the
$\hat{\phi}_{k,w}$'s) are still present in the ranking function.

lcm-web-10p  This run is identical to lcm-web, except that we fix the
number of feedback documents to M = 10.

lcm-4res  This last run uses our concept modeling on the four general
sources of information presented earlier. The concept models obtained
from the different sources are combined in the final document ranking
function:

$$s_{4res}(Q,d) = P(d|Q) + \sum_{\sigma \in S} \sum_{k \in T_{K,M}(\sigma)} \hat{\delta}_k \sum_{w \in W_k} \hat{\phi}_{k,w} \cdot P(d|w)$$

where $S$ is the set of sources of information and $T_{K,M}(\sigma)$
is the concept model composed of $K$ concepts modeled from $M$
feedback documents which were extracted from a source $\sigma$.
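The multi-source combination only adds an outer sum over the sources. A minimal sketch, assuming the base query score and the per-word document probabilities have already been computed elsewhere (the names `multi_source_score`, `base_score` and `word_prob` are hypothetical):

```python
from typing import Dict, List, Tuple

Concept = Tuple[float, Dict[str, float]]  # (delta_hat_k, {word: phi_hat_kw})


def multi_source_score(base_score: float,
                       models_by_source: Dict[str, List[Concept]],
                       word_prob: Dict[str, float]) -> float:
    """Add the concept matches of every source to the base query score."""
    concept_part = 0.0
    for concepts in models_by_source.values():
        for delta, words in concepts:
            concept_part += delta * sum(phi * word_prob.get(w, 0.0)
                                        for w, phi in words.items())
    return base_score + concept_part


# Toy data (invented): two sources, precomputed word probabilities for one document.
models_by_source = {
    "web": [(1.0, {"a": 1.0})],
    "wiki": [(0.5, {"b": 1.0}), (0.5, {"a": 1.0})],
}
word_prob = {"a": 0.2, "b": 0.1}
total = multi_source_score(0.01, models_by_source, word_prob)  # about 0.36
```

Each source contributes its own concept model, so a document is rewarded whenever it matches concepts from any of the sources.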

4.3  Results 

We report in this section the results of our runs for both the Ad Hoc
(Table 2) and the diversity metrics (Table 3). We also present the
results of a standard competitive baseline, the Markov Random Field
for IR (Metzler and Croft, 2005), as a means of comparison. We chose
the Sequential Dependence Model instantiation of this model and set
the various weights as recommended by the authors ($\lambda_T =
0.85$, $\lambda_O = 0.1$ and $\lambda_U = 0.05$). This baseline has
been shown to be highly effective in previous TREC tracks, especially
in those involving web documents. For both tables of results, we use
a two-sided paired t-test to determine statistically significant
differences with MRF-IR (*: p < 0.1; **: p < 0.05; ***: p < 0.01).
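For readers who want to reproduce this kind of significance test, the paired t statistic over per-topic scores can be computed as sketched below. The per-topic values are invented for illustration and are not results from our runs. Turning the statistic into a p-value requires the Student t distribution, which the standard library does not provide; a library routine such as `scipy.stats.ttest_rel` could be used instead.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(run: Sequence[float],
                       baseline: Sequence[float]) -> Tuple[float, int]:
    """Return the paired t statistic and its degrees of freedom."""
    if len(run) != len(baseline) or len(run) < 2:
        raise ValueError("need two score lists of equal length >= 2")
    diffs = [r - b for r, b in zip(run, baseline)]
    sd = stdev(diffs)  # sample standard deviation of per-topic differences
    if sd == 0.0:
        raise ValueError("all per-topic differences are identical")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical per-topic scores for a run and a baseline (not real results).
run_scores = [0.2, 0.4, 0.1, 0.5]
baseline_scores = [0.1, 0.2, 0.1, 0.3]
t_stat, dof = paired_t_statistic(run_scores, baseline_scores)
```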


Run            ERR@20      nDCG@20
MRF-IR         0.1038      0.1041
lcm-web        0.1334**    0.1306**
lcm-web-noW    0.1352**    0.1337*
lcm-web-10p    0.1364***   0.1339*
lcm-4res       0.1428***   0.1401***

Table 2: Ad Hoc results for our four submitted runs.

Although there is not much difference in averaged scores between our
four runs, we see that lcm-4res achieves highly significant
improvements over the MRF-IR baseline. Moreover, the three other runs
fail to retrieve any relevant document in the top 20 ranks (ERR@20 =
nDCG@20 = 0) for 13 topics, while the lcm-4res approach only fails
for 9 topics. It is however interesting to note that MRF-IR fails on
the same topics as our Latent Concept Modeling (LCM) approaches. It
may be a language modeling issue, and it may be interesting to
compare with other participants that explored other retrieval models.
The indexing of only non-spam documents could also be an explanation
and needs further exploration.

When looking at runs individually, fixing the number of feedback
documents to 10 achieves better results on average than using an
adaptive method. Although the improvements of lcm-web-10p over MRF-IR
are less significant than those of lcm-web for nDCG@20, the gain in
computation time seems to make fixing M worthwhile.


Run            ERR-IA@20   α-nDCG@20   P-IA@20
MRF-IR         0.2662      0.3653      0.1955
lcm-web        0.3166*     0.4160**    0.2501***
lcm-web-10p    0.3110*     0.4115*     0.2427***
lcm-4res       0.3176**    0.4240***   0.2479***
lcm-web-noW    0.3205*     0.4194**    0.2503***

Table 3: Diversity results for our four submitted runs.


As for diversity, removing the concept weights seems to improve the
results on average; however, lcm-4res again achieves more
statistically significant improvements than the other runs. It also
reduces the number of topic failures to only one, compared to 4 for
the other runs and 5 for MRF-IR.

Overall, the influence of concept weighting is rather low. When
comparing results topic by topic between lcm-web and lcm-web-noW, we
see no significant differences. This is most likely because all the
concepts refer to common themes and share the same vocabulary. In
addition, using a small number of feedback documents leads to
computing LDA in a reduced probabilistic space. Hence, some words
that are very important with respect to the query are present in
every concept, thus diminishing the effect and the interest of
concept weighting.

5  Conclusion 

This paper detailed the runs we submitted to the TREC 2012 Web track.
Our approach was to model the latent concepts underlying an
information need. The goal was to broaden the scope of the search and
ultimately promote retrieval diversity, without hurting topical
relevance.

Official  results  suggest  that  our  approach  works 
quite  well  for  both  ad  hoc  and  diversity  metrics. 
The  use  of  several  sources  of  information  (instead 
of  sticking  to  the  target  collection)  is  found  useful 
in  this  context. 